Experimental Setup
To answer our research questions, we will use a multi-treatment survey experiment, that tests the efficacy of different messages at the state and federal levels.
Specifically, we will use seven different messages with questions on confidence (the dependent variable) both pre and post test. We will have one control group and six different messages to use on treatment groups. The control group will receive a message about election integrity, with no mention of RLAs. The six treatment groups will each receive a different message about RLAs of similar length and reading level.
Our pre-test questions will gather covariates to get a sense of participant’s trust in government and civic engagement. Participants will first answer survey questions about their confidence in the accurarcy of vote count elections for our pre-test dependent variables (state and federal levels). Next, we will gather covariates around the participant’s level of trust in government at the state and federal levels, and trust in recent presidential elections. Then, we will gather demographic information on age, gender, parenthood, and state of residence.
After gathering information on the aforementioned covariates, respondents will receive different information based on their treatment group. Those in the control group will receive information about the importance of elections. No information on RLAs will be present. The remaining six groups will move from the covariate questions to a screen giving a general overview on RLAs. From that point, they will move to another page, which will contain their actual treatment message.
After receiving the treatment, all participants will be asked post-test questions on their confidence in state and national elections. They will then be asked a question on how the information received in the survey changed their confidence on the accuracy of the vote count. Of note, this will be a secondary variable that will be used as a robustness check on our primary dependent variable. After asking information on the dependent variable and secondary dependent variable, we will ask additional questions on covariates like political ideology, race, citizenship, education, civic engagement. These will be useful covariates to analyze our data, gauge political attitudes, and measure the effect of messages for subgroups. By asking these questions after the treatment, we aim to reduce the potential priming that could occur by associating with a particular political ideology or party. This design is a between-subject design, where interventions are implemented at the respondent level.
Sample recruitment
Participants will be recruited through Lucid during the first half of May 2021, which allows us to target a representative sample of US adults (18 years old and up). Our target sample size is 200 respondents for each treatment group, totaling 1600 respondents across all 50 states.
Apart from Colorado, several other states have recently or are planning to use RLAs in the future and the goal of using a nationally representative sample is to gain generalizable insights about the potential benefits of well-communicated information about RLAs on voter confidence in the election.
Treatment
Drawing on the literature on experimental interventions around election processes, we use a treatment message designed to explain how RLAs work at the respondent level. This will allow us to measure if receiving information increases confidence in elections. The treatment messages use rational and emotional appeals, grounded in real-world analysis around trust in elections (e.g. that voters trust their local election officials). The treatments are presented in Table 1.
All respondents are assigned either the Control message or one of the six treatment messages. The messages are all of similar length and reading level. The treatment process is detailed in Figure 1.
Table 1
| 1. Bipartisan Auditors |
Respondent |
Prompt: Did you know? The RLA process is balanced. Republican and Democrat representatives audit your state’s election system. This means that, when a random sample of votes is being checked for any miscounts, representatives from both parties review the process and sign the final results. |
| 2. Risk Limit as a Percentage |
Respondent |
Prompt: Did you know? Before starting an audit, election officials agree on a risk limit (9% in this case). The lower the risk limit, the more ballots need to get recounted to verify the election result. If the originally reported election result is wrong, then, a risk limit of 9% means that there’s a 91% chance the RLA will find the mismatches and correct the result. |
| 3. Hand-Counted Votes |
Respondent |
Prompt: Did you know? The RLA consists of auditors examining a random sample of ballots by hand and comparing them to how the machine counted those ballots. This hand recount ensures that the machine accurately captured every voters intent. Examining a random selection of ballots by hand allows election officials and the public to be confident that the election results reflect the will of the people. |
| 4. Emotional - Politicians are Bad Losers |
Respondent |
Prompt: Did you know? Every four years, the losing politician has only one move left to make – cast doubt on the election process. The American people are tired of this bullshit! Your state election officials use RLAs to ensure that the lies spewed by losing politicians are just that - empty words spoken by losers. |
| 5. Random Sample (Soup Analogy) |
Respondent |
Prompt: Did you know? Election officials use a random sample of ballots when they audit election results. But, how can a random sample of ballots confirm the results for the entire state? Well, it’s similar to tasting a giant pot of soup to see if it’s too salty. You stir the pot thoroughly and take a random sample. You don’t have to eat the whole bowl to know if it is good or not - just a random spoonful of soup. |
| 6. RLAs are Run by Local Election Officials |
Respondent |
Prompt: Did you know? The people who run the RLAs - the local election officials - are people like you; they can be your neighbors, friends, those you see at the store, soccer games, church, etc. These officials are the ones responsible for auditing the election results. The election system was designed to be run by local communities with multiple security checks and audits. |
| 7. Control |
Respondent |
Prompt: Did you know? Elections are at the heart of democracy. Voting in an election gives the people a way to choose who governs them. The election process provides a way for power to be transferred peacefully. It is important that people who work in elections serve the American people to protect our democracy. |
Table 1. Description of interventions included in the experiment
Primary Outcomes
We will use a pre-test/post-test design to address our primary research question:
- Does communicating about RLAs increase trust in election system for the US adult population?
Prior to treatment, respondents will be asked to rate on a scale (1-10) their confidence in election processes at the state and national level. After answering this question, along with select covariates, respondents will be randomly assigned to either the Control or one of the six treatment groups to read a message. If treated, they will receive a message about RLAs, whereas the Control group will receive a message about elections but with no reference to RLAs. Respondents will then be asked to answer the same question about confidence in election processes at the state and national level, along with other covariates that are related to elections (e.g. party affiliation).
The primary outcome we are interested in is determining a change in confidence in election processes after being treated. We will measure the dependent variables by taking the difference in means between the pre- and post-test dependent variables, measured on a scale of 1-10, to see if there is a noticeable change between the Control and treatment messages.
By running a two-sided T-test, we will be able to determine if talking about RLAs positively or negatively impacts voter confidence. Through this, we will be able to compare the treatment groups to the Control based on the average change in pre- and post-test ranking about confidence in voting at the state and national level. For a more thorough analysis, we will run a linear regression, controlling for covariates, and analyze the effect of of being treated versus being in the control group.
The survey questions that will allow us to measure this change are two-fold to account for differences in confidence at the state versus national level, all measured on a scale (1-10).
Pre-Treatment Questions
How confident are you that votes in your state will be counted accurately in the upcoming 2024 presidential election?
How confident are you that votes across all states will be counted accurately in the upcoming 2024 presidential election?
Post-Treatment Questions
Given what you know about election processes, how confident are you that votes in your state will be counted accurately in the upcoming 2024 presidential election?
Given what you know about election processes, how confident are you that votes across all states will be counted accurately in the upcoming 2024 presidential election?
The pre-test/post-test design will allow us to measure the average treatment effect compared to the Control, whether positive or negative.
Because of random assignment, we expect to see no systematic differences in pre-test confidence levels across treatment groups, conditional on covariates.
Secondary Outcomes
In addition, we measure secondary outcomes by comparing the effectiveness of each treatment against each other to see which message is more effective. We will run a multiple hypothesis test between each treatment group to determine this marginal treatment effect. We will also examine the effect for particular sub-groups.
The secondary research questions we are most interested in is:
- Which messages about RLAs are the most effective at increasing trust in elections?
To explore this question, we will compare the treatment groups to each other to determine which message is most effective. We will pursue this by using a linear regression model to compare each message’s coefficient with that of the Control.
Finally, after receiving the treatment or Control, respondents will be asked the following question measured on a scale (1-5):
- To what extent did receiving the information on election processes change your trust in your State’s ability to count ballots?
This question is intended to capture a secondary effect from being treated (or not treated). We will use this secondary effect as the outcome variable in a linear regression to determine respondents self-reported change in confidence.
Covariates
The existing literature on election integrity focuses on voter confidence in specific elections (Atkeson et al., 2015), the role of partisanship (Sances et. al, 2015), and factors that can drastically affect voter experiences like poll workers (Hall et al., 2009). Our study contributes to the existing literature by exploring what messages are most effective at explaining RLAs and building trust and confidence in election systems. What messages are most salient with US adults to understand and trust that their vote was accurately counted?
In addition to the demographic covariates such as age, race, gender, and education, we also include specific questions regarding party affiliation, political ideology, and trust in state and federal government. This includes a control for people with strong or unwavering views on election integrity for us to measure the treatment effect.
These variables capture what might be sources of heterogeneity in responses to RLAs and election integrity: age, education, religious views, and political ideology, and trust in government. We will control for these covariates in the primary regression, and will examine the heterogeneous treatment effects based on sub-groups, such as age, race, and gender. Results from collecting these covariates could help our partner and other organizations target messaging based on the population.
Hypotheses
Primary:
The primary hypothesis that we are interested in testing the change in voter confidence after the intervention to measure whether any of the treatments to communicate about RLAs are effective in increasing voter trust in election systems. This compares the treatment to the Control group.
We hypothesize that being treated with any of the six messages will have a positive effect compared to the Control group. In addition, given that we are measuring confidence in state and national elections, we expect the treatment effect will be smaller at the national level than state level given prior research showing that voters trust elections in their vicinity.
Secondary (Hypotheses to inform industry practice) Our secondary hypothesis will measure which of the six messages were most effective, comparing the treatments to each other. We expect that learning about RLAs will have at least a small positive effect on trust in election systems, and that the Bipartisan message and Bad Loser, the emotional appeal, will have the largest effects compared to the other treatments. In addition, we anticipate the message about RLA as a percentage will have a neutral or negative effect on confidence in election processes.
We care about the outcome (increasing voter confidence) and understanding which types of people are nudged toward this outcome by particular treatments. Therefore, we plan to examine how a few select treatments interact with particular covariates of interest, namely age, gender, party affiliation, income, and education.
For low-income and low-educated participants, we expect the message about RLAs with the percentage will have a negative effect. For those that identify as Republicans, the message “Bad Losers” will be less effective. Respondents that identify as Republicans will be less affected overall by the treatments than those identifying as Democrats. In addition, respondents with low trust in the government (state or national) will be effected less than those with high trust in the government.
We designed treatments based one existing research on what increases, or decreases, confidence in election processes. The below covariates were selected based on what surveys and social media companies collect, which could be used for targeting interventions. For our covariates of interest, we will divide these into two groups for binary variables (i.e. female, race, parent, eduction). We will split on the median value for continuous variables (i.e. age).
Treatments (respondent-level):
- Bipartisan Auditors
- Risk Limit as a Percentage
- Hand-Counted Votes
- Politicians as Bad Losers
- Random Sample Soup Analogy
- Local Election Officials
- Control
Covariates:
- Age
- Gender
- Race
- Income
- Education
- High school education
- College education and above
- Political party
- Parenthood
- Trust in state and federal government
- State of residency
Analysis Scripts
# # Ensure that pacman is installed for package management and loading.
# if (!require("pacman")) install.packages("pacman")
# # for data reading wrangling and visualization
pacman::p_load(tidyverse)
# for working directories
pacman::p_load(here)
# # for cross tabulation and data cleaning
# pacman::p_load(janitor)
# for working with strings
pacman::p_load(glue)
# For randomized inference, also loads randomizr and estimatr
pacman::p_load(ri2)
# for marginal effects from lineal regressions
pacman::p_load(margins)
# Tests for linear regression models
pacman::p_load(lmtest)
pacman::p_load(car)
# Tables
pacman::p_load(kableExtra)
# for updated ggplot2 theme
pacman::p_load(hrbrthemes)
# for updated ggplot2 colorblind-friendly scheme
pacman::p_load(ggthemes)
# theme_set(hrbrthemes::theme_ipsum())
pacman::p_load(reshape2)
# for plotting of covariate balance
pacman::p_load(cobalt)
# for matching only
pacman::p_load(MatchIt)
library(stargazer)
library(usmap)
library(ggplot2)
library(maps)
#install.packages("mapproj")
library(mapproj)
#install.packages("viridis")
library(viridis)
library(dplyr)
library(ggthemes)
library(sjPlot)
library(sjmisc)
library(here)
df <- read.csv(here('', 'rla_clean_5_12.csv'))
The above code will read in our dataset.
df$age <- as.numeric(df$age)
# exclude data with missing covariates
no_na_df <- df %>% drop_na('age', 'state', 'income_clean', 'party', 'gender_female', 'parent_yes', 'race_hispanic', 'edu_4college', 'edu_hs')
love.plot(treatment_group_num ~ age + gender_female + parent_yes + state + income_clean + race_hispanic + edu_4college + edu_hs + dv_pre_state_conf + dv_pre_national_conf + party,
data = no_na_df,
binary = "std",
limits = c(-.5, .5),
thresholds = c(m = .1)) # threshold determines the dotted lines on the graph
m.out <- matchit(dummy_treat ~ age + gender_female + parent_yes + state + income_clean + race_hispanic + edu_4college + edu_hs + dv_pre_state_conf + dv_pre_national_conf + party,
data = no_na_df,
method = "nearest", replace = TRUE) # perform matching
# plot covariate balance with and without matching
love.plot(m.out, binary = "std", thresholds = c(m = .1))
The above code will test our treatment groups to ensure they are balanced. Here, we are testing to ensure there is no statisical bias between groups. In other words, we want to make sure that men and women were approximately equally distributed amongst our treatment groups.
# people within 2 SD of duration
sd2 = sd(df$duration_sec)*2
duration_sd2_df <- no_na_df %>%
filter(abs(duration_sec - mean(duration_sec)) < sd2)
# check to see how many people are removed
rm(duration_sd2_df)
# people who passed attention check
atten_check_df <- no_na_df %>%
filter(attention_pass == 1)
love.plot(treatment_group_num ~ age + gender_female + parent_yes + state + income_clean + race_hispanic + edu_4college + edu_hs + dv_pre_state_conf + dv_pre_national_conf + party,
data = atten_check_df,
binary = "std",
limits = c(-.5, .5),
thresholds = c(m = .1)) # threshold determines the dotted lines on the graph
m.out <- matchit(dummy_treat ~ age + gender_female + parent_yes + state + income_clean + race_hispanic + edu_4college + edu_hs + dv_pre_state_conf + dv_pre_national_conf + party,
data = atten_check_df,
method = "nearest", replace = TRUE) # perform matching
# plot covariate balance with and without matching
love.plot(m.out, binary = "std", thresholds = c(m = .1))
The above code will analyze the impact of keeping those respondents who failed the attention check in our data frame. If the impact is negligible, we plan to keep the attention check failures in the dataset to allow for a larger sample size.
# regression of covariates on treatment assignment variable
balance_lm <- lm(treatment_group_num ~ age + gender_female + parent_yes + state + income_clean + race_hispanic + edu_4college + edu_hs + party + dv_pre_state_conf + dv_pre_national_conf, data = no_na_df) # factor(region) encodes the string variable as a factor for analysis
# Test whether all coefficients from the balancce_lm regression are equal to zero
# using heteroskedasticity-robust standard errors, denoted by hc2
car::linearHypothesis(balance_lm, c("gender_female = 0", "age = 0", "parent_yes = 0", "race_hispanic = 0", "edu_hs = 0", "dv_pre_state_conf = 0", "dv_pre_national_conf = 0"),
test = "F", white.adjust = "hc2", singular.ok = TRUE)
summary(balance_lm)
The above code uses a linear regression to conduct similar balance checks to those above. This is used as a robustness check.
# Define a function to generate a balance check table using two-sample t-test
# Parameters: "cov_list" includes covariates, "treat" indicates the treatment, "alpha" is the significance level
t_table <- function(data, cov_list, treat,
alpha = 0.05) {
# for each covariate in cov_list, apply a function to conduct the t-test
out <- lapply(cov_list, function(x) {
tres <- t.test(data[[x]] ~ data[[treat]]) # t-test result between control and treatment groups' covariate x
# capture the group means for both control and treatment, and the p-value
c(mean_control = as.numeric(tres$estimate[1]), mean_treat = as.numeric(tres$estimate[2]), p_value = tres$p.value)
})
# save results by binding the results for all above covariates in cov_list by row (rbind)
out <- as.data.frame(do.call(rbind, out))
# combine covaraite names and results by column (cbind)
out <- cbind(covariate = cov_list, out)
# get rid of non-important strings in the names
names(out) <- gsub("[^0-9A-Za-z_]", "", names(out)) # gsub(pattern, replacement, x) replaces pattern in x with replacement
# code presentation of p-value according to whether they are below a threshold, for e.g., 0.001
out$p_value <- ifelse(out$p_value < 0.001,
"<0.001",
round(out$p_value, 3) # rounding of p values to 3 decimal places
)
return(out)
}
# Apply above t-t_table function to our dataset and selected covariates
balance_table <- t_table(
data = df,
c("gender_female", "parent_yes", "race_hispanic", "race_white", "race_asian", "race_black", "edu_hs", "edu_4college", "trust_federal", "trust_state"),
"dummy_treat"
)
# output the balance table
balance_table %>%
knitr::kable() %>%
kableExtra::kable_styling(bootstrap_options = "striped")
The above code is also looking at the balance between treatment groups, using a two sample t-test.
# balance table for Democrats
balance_dems <- t_table(
data = subset(df, party=="Democrat"), # subsetting data to those with "party" equal to "Democrat"
c("gender_female", "parent_yes", "race_hispanic", "race_white", "race_asian", "race_black", "edu_hs", "edu_4college", "trust_federal", "trust_state"),
"dummy_treat"
)
# balance table for Republicans
balance_repubs <- t_table(
data = subset(df, party=="Republican"), # subsetting data to those with "party" equal to "Republican"
c("gender_female", "parent_yes", "race_hispanic", "race_white", "race_asian", "race_black", "edu_hs", "edu_4college", "trust_federal", "trust_state"),
"dummy_treat"
)
balance_indpt_other <- t_table(
data = subset(df, party=="Independent" | party == "Other Party"), # subsetting data to those with "party" equal to "Republican"
c("gender_female", "parent_yes", "race_hispanic", "race_white", "race_asian", "race_black", "edu_hs", "edu_4college", "trust_federal", "trust_state"),
"dummy_treat"
)
# combine the 4 t-tables, excluding first column of covariate names for the latter 3, so that only one covariate name column is there
balance_party = cbind(balance_dems, balance_repubs %>% select(2:4), balance_indpt_other %>% select(2:4))
# Format: adding in headers
x <- knitr::kable(balance_party, digits = 2) %>% kable_styling()
# add in a header to label what we're cross-tabulating with
add_header_above(x, c('', 'Democrats'=3, 'Republicans'=3, 'Other and Independent'=3))
Above, we check for balance among political parties.
df[c(6:10,12,19:21,23,27,29,30,32,35:42,44,45)] <- lapply(df[c(6:10,12,19:21,23,27,29,30,32,35:42,44,45)], factor)
atten_check_df$treatment_group <- relevel(factor(atten_check_df$treatment_group), ref = "Control")
# run regressions to estimate treatment effect with robust standard errors
# run the regression for state confidence level
reg1 <- lm(dv_post_state_conf ~ dummy_treat + dv_pre_state_conf, data = df)
# run the regression for national confidence level
reg2 <- lm(dv_post_national_conf ~ dummy_treat + dv_pre_national_conf, data = df)
# run the regression with controls for state level
reg3 <- lm(dv_post_state_conf ~ dummy_treat + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# run the regression with controls for national level
reg4 <- lm(dv_post_national_conf ~ dummy_treat + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
# compare with those who did not fail attention check
reg1_att <- lm(dv_post_state_conf ~ dummy_treat + dv_pre_state_conf, data = atten_check_df)
reg2_att <- lm(dv_post_national_conf ~ dummy_treat + dv_pre_national_conf, data = atten_check_df)
reg3_att <- lm(dv_post_state_conf ~ dummy_treat + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = atten_check_df)
reg4_att <- lm(dv_post_national_conf ~ dummy_treat + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = atten_check_df)
# do results vary by those who failed the attention check?
# compare by controlling for attention check
reg1_att_con <- lm(dv_post_state_conf ~ dummy_treat + dv_pre_state_conf + attention_pass, data = df)
# do results look the same?
# run the regression for national confidence level
reg2_att_con <- lm(dv_post_national_conf ~ dummy_treat + dv_pre_national_conf + attention_pass, data = df)
# do the results look the same?
# run the regression with controls for state level
reg3_att_con <- lm(dv_post_state_conf ~ dummy_treat + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + attention_pass + trust_state, data = df)
# do results look the same?
# run the regression with controls for national level
reg4_att_con <- lm(dv_post_national_conf ~ dummy_treat + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + attention_pass + trust_federal, data = df)
# do the results look the same?
The code above contains important elements of our analysis. First, it will run the state and national level regressions to determine the effect of being treated versus controlled. The initial regressions do not control for any covariates. Then it will run a similar regression, this time with controls in place. These initial regressions do not include respondents that fail the attention check. Then, it runs these same regressions including the group that failed the attention check and will assess whether or not there is a statistically significant difference. This chunk of code will test our first hypothesis, determining whether being treated (despite treatment message) is different than not being treated.
# run the regression with treatment groups
df$treatment_group <- relevel(factor(df$treatment_group), ref = "Control")
#state level with treatment groups
reg5 <- lm(dv_post_state_conf ~ treatment_group + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national with treatment groups
reg6 <- lm(dv_post_national_conf ~ treatment_group + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
# compared with those who did not fail the attention check
reg5_att <- lm(dv_post_state_conf ~ treatment_group + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = atten_check_df)
# are results different?
reg6_att <- lm(dv_post_national_conf ~ treatment_group + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
# are results different?
# compare controlling for the attention check
#state level with treatment groups
reg5_att_con <- lm(dv_post_state_conf ~ treatment_group + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + attention_pass + trust_state, data = df)
# are results different?
# national with treatment groups
reg6_att_con <- lm(dv_post_national_conf ~ treatment_group + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + attention_pass + trust_federal, data = df)
# are results different?
# we can get rid of the extra regressions now
rm(reg1_att, reg1_att_con,reg2_att, reg2_att_con,reg3_att, reg3_att_con, reg4_att, reg4_att_con,reg5_att, reg5_att_con,reg6_att, reg6_att_con)
rm(atten_check_df)
Like the previous block of code, this code contains analysis important to our experiment. Namely, it tests our second hypothesis for differences in treatment effects by message. The regressions are run at the state and national level with controls. Again, we plan to compare the respondents with and without attention check failures.
# run a linear regression to measure the effect of treatments on the secondary effect question
reg_state_seceffect <- lm(seceffect ~ treatment_group + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
The above code is the linear regression for our secondary dependent variable, which will be used as a robustness check.
labs <- c("Bipartisan", "Hand Count", "Local", "Loser", "RLA Percentage", "Soup")
# for the state level with controls
lm_state <- bind_rows(list(tidy(reg5))) %>%
filter((term %in% c("treatment_groupControl", "treatment_groupBipartisan", "treatment_groupHandcount", "treatment_groupLocal", "treatment_groupLoser", "treatment_groupRL_percentage", "treatment_groupSoup")))
p4 <- ggplot(lm_state, aes(x = term,y =estimate)) +
geom_point(position=position_dodge(width=0.5)) +
geom_errorbar(aes(x=term,ymin = estimate - 1.96 * std.error,ymax = estimate + 1.96 * std.error),width = .1,position=position_dodge(width=0.5), color = "steelblue2") +
xlab('Treatment') +
ylab('Estimate') +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, vjust = 1, hjust =1)) +
geom_hline(yintercept = 0, colour = 'grey60', linetype = 2) +
coord_cartesian(ylim=c(-.75, .75)) +
ggtitle('Post-Test State Outcome w/ controls (conditioned on pre-test)') +
scale_x_discrete(labels = labs)
# for the national level
lm_national <- bind_rows(list(tidy(reg6))) %>%
filter((term %in% c("treatment_groupControl", "treatment_groupBipartisan", "treatment_groupHandcount", "treatment_groupLocal", "treatment_groupLoser", "treatment_groupRL_percentage", "treatment_groupSoup")))
# plot treatment effect
p5 <- ggplot(lm_national, aes(x = term,y =estimate)) +
geom_point(position=position_dodge(width=0.5)) +
geom_errorbar(aes(x=term,ymin = estimate - 1.96 * std.error,ymax = estimate + 1.96 * std.error),width = .1,position=position_dodge(width=0.5), color = "firebrick") +
xlab('Treatment') +
ylab('Estimate') +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, vjust = 1, hjust =1)) +
geom_hline(yintercept = 0, colour = 'grey60', linetype = 2) +
coord_cartesian(ylim=c(-.75, .75)) +
ggtitle('Post-Test National Outcome w/ controls (conditioned on pre-test)') +
scale_x_discrete(labels = labs)
# for the state level secondary effect with controls
lm_state_sec <- bind_rows(list(tidy(reg_state_seceffect))) %>%
filter((term %in% c("treatment_groupControl", "treatment_groupBipartisan", "treatment_groupHandcount", "treatment_groupLocal", "treatment_groupLoser", "treatment_groupRL_percentage", "treatment_groupSoup")))
sec_effect <- ggplot(lm_state_sec, aes(x = term,y =estimate)) +
geom_point(position=position_dodge(width=0.5)) +
geom_errorbar(aes(x=term,ymin = estimate - 1.96 * std.error,ymax = estimate + 1.96 * std.error),width = .1,position=position_dodge(width=0.5), color = "steelblue2") +
xlab('Treatment') +
ylab('Estimate') +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, vjust = 1, hjust =1)) +
geom_hline(yintercept = 0, colour = 'grey60', linetype = 2) +
coord_cartesian(ylim=c(-.5, 1)) +
ggtitle('Secondary Effect State Outcome w/ controls (conditioned on pre-test)') +
scale_x_discrete(labels = labs)
p4
p5
sec_effect
The code above will plot the treatment effects.
# simple difference in means for those treated with a message
with(df, mean(dv_post_state_conf[dummy_treat == 1]) - mean(dv_post_state_conf[dummy_treat == 0])) #with(data, expr, …) evaluates the expr on the data
with(df, mean(dv_post_national_conf[dummy_treat == 1]) - mean(dv_post_national_conf[dummy_treat == 0])) #with(data, expr, …) evaluates the expr on the data
The code above will run a simple t-test to determine whether or not being treated has an effect.
# Use the estimatr package for difference in means estimates
estimatr::difference_in_means(dv_state_treatment_diff ~ dummy_treat, data = df)
# difference_in_means(y ~ x, data) computes mean(y when x==1) - mean(y when x==0) in data, along with standard errors and p-values from two-sided t-tests
estimatr::difference_in_means(dv_national_treatment_diff ~ dummy_treat, data = df)
This code will also run a t-test to determine whether or not being treated has an effect.
# difference in means for women at state level
df$gender_female <- to_factor(df$gender_female)
estimatr::difference_in_means(dv_state_treatment_diff ~ dummy_treat, data = df, subset = gender_female == "1")
# difference in means for women at national level
estimatr::difference_in_means(dv_national_treatment_diff ~ dummy_treat, data = df, subset = gender_female == "1")
# difference in means for age at state level
df$age <- to_numeric(df$age)
# age < 40
estimatr::difference_in_means(dv_state_treatment_diff ~ dummy_treat, data = df, subset = age < 40)
# age > 40
estimatr::difference_in_means(dv_state_treatment_diff ~ dummy_treat, data = df, subset = age > 40)
# difference in means for age at national level
# age > 40
estimatr::difference_in_means(dv_national_treatment_diff ~ dummy_treat, data = df, subset = age > 40)
# difference in means for HS education at state level
estimatr::difference_in_means(dv_state_treatment_diff ~ dummy_treat, data = df, subset = edu_hs == "1")
# difference in means for HS education at national level
estimatr::difference_in_means(dv_national_treatment_diff ~ dummy_treat, data = df, subset = edu_hs== "1")
# difference in means for college education at state level
estimatr::difference_in_means(dv_state_treatment_diff ~ dummy_treat, data = df, subset = edu_4college == "1")
# difference in means for college education at national level
estimatr::difference_in_means(dv_national_treatment_diff ~ dummy_treat, data = df, subset = edu_4college == "1")
The code above will run a t-test for many of the heterogeneous treatment effects for which we have interest.
# Gender interaction
#state level interaction for gender
reg_state_fem <- lm(dv_post_state_conf ~ treatment_group*gender_female + dv_pre_state_conf + age + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national level interaction for gender
reg_natl_fem <- lm(dv_post_national_conf ~ treatment_group*gender_female + dv_pre_national_conf + age + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
The code above will create a linear regression with interaction between gender and treatement effect, which will be used to determine heterogeneous treatment effects based on gender.
# Age interaction
#state level
reg_state_age <- lm(dv_post_state_conf ~ treatment_group*age + gender_female + dv_pre_state_conf + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national level
reg_natl_age <- lm(dv_post_national_conf ~ treatment_group*age + gender_female + dv_pre_national_conf + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
The code above will create a linear regression with interaction between age and treatement effect, which will be used to determine heterogeneous treatment effects based on age.
# Education interaction
# state only HS
reg_state_edu_hs <- lm(dv_post_state_conf ~ treatment_group*edu_hs + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + party + trust_state, data = df)
# national only HS
reg_natl_edu_hs <- lm(dv_post_national_conf ~ treatment_group*edu_hs + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + party + trust_federal, data = df)
# state college or above
reg_state_edu <- lm(dv_post_state_conf ~ treatment_group*edu_4college + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_hs + party + trust_state, data = df)
# national college or above
reg_natl_edu <- lm(dv_post_national_conf ~ treatment_group*edu_4college + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_hs + party + trust_federal, data = df)
The code above will create a linear regression with interaction between education and treatement effect, which will be used to determine heterogeneous treatment effects based on education.
# income interaction
# state
reg_state_inc <- lm(dv_post_state_conf ~ treatment_group*income_num + dv_pre_state_conf + age + gender_female + parent_yes + state + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national
reg_natl_inc <- lm(dv_post_national_conf ~ treatment_group*income_num + dv_pre_national_conf + age + gender_female + parent_yes + state + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
The code above will create a linear regression with interaction between income and treatement effect, which will be used to determine heterogeneous treatment effects based on income.
# always vs never believer interaction
df$always_believer <- to_factor(df$always_believer)
df$never_believer <- to_factor(df$never_believer)
# state never believer
reg_state_never <- lm(dv_post_state_conf ~ treatment_group*never_believer + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national never believer
reg_natl_never <- lm(dv_post_national_conf ~ treatment_group*never_believer + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
# state always believer
reg_state_always <- lm(dv_post_state_conf ~ treatment_group*always_believer + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + trust_state, data = df)
# national always believer
reg_natl_always <- lm(dv_post_national_conf ~ treatment_group*always_believer + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
The code above will create a linear regression with interaction between ‘always believers & never believers’ and treatement effect, which will be used to determine heterogeneous treatment effects based on belief in the government.
# political party interaction
# state level
df$party <- to_factor(df$party)
reg_state_party <- lm(dv_post_state_conf ~ treatment_group*party + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + trust_state, data = df)
# national level
reg_natl_party <- lm(dv_post_national_conf ~ treatment_group*party + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + trust_federal, data = df)
The code above will create a linear regression with interaction between political party and treatement effect, which will be used to determine heterogeneous treatment effects based on political party.
# race interaction
# state level race_hispanic Y/N
reg_state_hisp <- lm(dv_post_state_conf ~ treatment_group*race_hispanic + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national level race_hispanic Y/N
reg_natl_hisp <- lm(dv_post_national_conf ~ treatment_group*race_hispanic + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_white + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
# state level race_white
reg_state_white <- lm(dv_post_state_conf ~ treatment_group*race_white + race_hispanic + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_black + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national level race_white
reg_natl_white <- lm(dv_post_national_conf ~ treatment_group*race_white + race_hispanic + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_black + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
# state level race_black
reg_state_black <- lm(dv_post_state_conf ~ treatment_group*race_black + race_hispanic + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_white + race_asian + edu_4college + edu_hs + party + trust_state, data = df)
# national level race_black
reg_natl_black <- lm(dv_post_national_conf ~ treatment_group*race_black + race_hispanic + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_white + race_asian + edu_4college + edu_hs + party + trust_federal, data = df)
The code above will create a linear regression with interaction between race and treatement effect, which will be used to determine heterogeneous treatment effects based on race.
# state level middle views ( > 1 and < 10)
reg_middle_views_state <- lm(dv_post_state_conf ~ treatment_group + dv_pre_state_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party, data = df, subset = (dv_pre_state_conf < 10 & dv_pre_state_conf > 1))
# national middle views ( > 1 and < 10)
reg_middle_views_natl <- lm(dv_post_national_conf ~ treatment_group + dv_pre_national_conf + age + gender_female + parent_yes + state + income_clean + race_hispanic + race_white + race_black + race_asian + edu_4college + edu_hs + party, data = df, subset = (dv_pre_state_conf < 10 & dv_pre_state_conf > 1))
The code above will create a linear regression without including those respondents in the extremes (respondents who answered either a 1 or 10). We can use this regression to analyze whether or not those respondents not in the extremes have different treatment effects than those in the extremes.